-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL][CUDA][libclc] Add bf16 builtins and optimize half builtins for fma, fmin, fmax and fmax #5724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Adds support for the following builtins: abs, neg: - .bf16, - .bf16x2 min, max - {.ftz}{.NaN}{.xorsign.abs}.f16 - {.ftz}{.NaN}{.xorsign.abs}.f16x2 - {.NaN}{.xorsign.abs}.bf16 - {.NaN}{.xorsign.abs}.bf16x2 - {.ftz}{.NaN}{.xorsign.abs}.f32 Differential Revision: https://reviews.llvm.org/D117887
This patch adds builtins/intrinsics for the following variants of FMA: NOTE: follow-up commit with the missing clang-side changes. - f16, f16x2 - rn - rn_ftz - rn_sat - rn_ftz_sat - rn_relu - rn_ftz_relu - bf16, bf16x2 - rn - rn_relu ptxas (Cuda compilation tools, release 11.0, V11.0.194) is happy with the generated assembly. Differential Revision: https://reviews.llvm.org/D118977
NOTE: this is a follow-up commit with the missing clang-side changes. This patch adds builtins and intrinsics for the f16 and f16x2 variants of the ex2 instruction. These two variants were added in PTX7.0, and are supported by sm_75 and above. Note that this isn't wired with the exp2 llvm intrinsic because the ex2 instruction is only available in its approx variant. Running ptxas on the assembly generated by the test f16-ex2.ll works as expected. Differential Revision: https://reviews.llvm.org/D119157
Apply review suggestions. Co-authored-by: Alexey Bader <alexey.bader@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libclc changes look good to me.
I just removed the changes to native_exp2, as that is being implemented in a slightly different way in #5747. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FE changes LGTM As per comment: #5724 (comment)
This change (everything in clang folder) are already merged upstream. I just added them to this PR as they are required to build it. They will be part of the next pulldown.
This PR also contains some changes (everything in clang folder) that have been merged in upstream llvm since last pulldown and are required for building it.
@t4c1, could you please add upstream link?
Done - updated the PR description. |
For functions fma, fmin, fmax and fmax adds bf16 builtins to libclc and optimizes half builtins to use half instructions if supported by the device.
This PR also contains some changes (everything in clang folder) that have been merged in upstream llvm since last pulldown and are required for building it. There are parts of (something went wrong when merging these, so only parts were merged at first. The changes in this PR are the remainder): https://reviews.llvm.org/D118977 https://reviews.llvm.org/D117887 https://reviews.llvm.org/D119157
Tests for half changes are in intel/llvm-test-suite#880. Tests for bf16 implementations will be added together with adding support for these to runtime in future PRs.